Skip to content

Fix SymbolTable duplicate-name handling during BFBS deserialization#9012

Open
steadytao wants to merge 3 commits into
google:masterfrom
steadytao:fix-9009-avoid-double-free-on-duplicate-enums
Open

Fix SymbolTable duplicate-name handling during BFBS deserialization#9012
steadytao wants to merge 3 commits into
google:masterfrom
steadytao:fix-9009-avoid-double-free-on-duplicate-enums

Conversation

@steadytao
Copy link
Copy Markdown
Contributor

Summary

Reject duplicate enum names during BFBS deserialisation before transferring ownership into the symbol table.

Problem

SymbolTable::Add() appended objects to its ownership vector before checking for duplicate names. During binary schema deserialisation, duplicate enum names could therefore leave a freed pointer behind in the symbol table and later trigger a double-free during teardown.

Fix

  • check for duplicate names before appending the object to the symbol table's ownership vector
  • add a regression test using a malformed BFBS fixture with duplicate enum names
  • verify that VerifySchemaBuffer() accepts the schema but Parser::Deserialize() rejects it safely

Testing

  • built & ran flattests.exe
  • result: ALL TESTS PASSED

Fixes #9009

SymbolTable::Add() appended objects to its ownership vector before checking for duplicate names. During binary schema deserialization this allowed duplicate enum names to leave a freed pointer behind, which was later deleted again during teardown.

Check for duplicates before storing the pointer and add a regression test that verifies a malformed BFBS fixture is accepted by VerifySchemaBuffer() but rejected safely by Parser::Deserialize().

Tested with flattests.exe; all tests passed.
@steadytao steadytao requested a review from dbaileychess as a code owner April 3, 2026 02:35
@github-actions github-actions Bot added the c++ label Apr 3, 2026
jtdavis777
jtdavis777 previously approved these changes Apr 3, 2026
Delete the temporary EnumDef in Parser::StartEnum() when duplicate-name registration fails so duplicate enum/union declarations do not leak during parsing.
@github-actions github-actions Bot added the codegen Involving generating code from schema label Apr 3, 2026
@steadytao
Copy link
Copy Markdown
Contributor Author

CIFuzz found a real follow-up leak in Parser::StartEnum(): duplicate enum/union declarations allocate a temporary EnumDef and then return on duplicate-name rejection without freeing it. I’ve pushed a small follow-up onto this PR to delete that temporary object on the error path.

@steadytao steadytao requested a review from jtdavis777 April 3, 2026 15:21
@kagancapar
Copy link
Copy Markdown

Confirming this fix addresses a broader class of bugs than just SymbolTable<EnumVal>. While fuzzing Parser::Deserialize with a custom OSS-Fuzz harness, I reproduced the exact same root cause across four additional SymbolTable<T> template instantiations, all converging on the vec.emplace_back ordering this PR corrects:

SymbolTable<T> ASan SUMMARY Triggering idl_parser.cpp site
SymbolTable<RPCCall> heap-use-after-free idl.h:242 in ~SymbolTable<RPCCall> line 4283 (calls.Add in ServiceDef::Deserialize)
SymbolTable<ServiceDef> (cascade through ~Parser) same line 4513 (services_.Add in Parser::Deserialize)
SymbolTable<Value> heap-use-after-free idl.h:227 in ~Value line 4028 (attributes.Add in DeserializeAttributesCommon)
StructDef cascade heap-use-after-free idl.h:397 in ~StructDef embedded SymbolTable<FieldDef> field 4149

Plus 17 additional triggering inputs that surface the same root cause as double-free rather than heap-use-after-free (different free order on the destructor cascade), bringing total to 30 distinct triggering inputs in a 4-minute fuzz session.

What this means for the PR

Your one-line fix in idl.h is sufficient for all of the above — they all delegate to SymbolTable<T>::Add regardless of T. The per-callsite cleanup in StartEnum (the delete &enum_def you added at idl_parser.cpp:3128) is a defense-in-depth measure for the enums_ table specifically; the equivalent delete already exists at the other 5 antipattern sites in idl_parser.cpp (lines 4028, 4149, 4156, 4283, 4319, 4513), and your idl.h change makes those delete calls safe.

Suggested addition: broader regression set

If useful, I have minimized PoC bytes for each of the 4 additional manifestations (200 B, 231 B, 389 B, 500 B). Happy to attach them as additional entries in tests/test.cpp next to your DuplicateEnumNameBinarySchemaTest, formatted the same way. Just let me know if you'd prefer me to push them as a follow-up PR or if you'd like to incorporate directly. Either way, this PR's coverage is materially broader than its current title implies.

Companion fuzz target

Separately, I'm preparing a bfbs_parser_fuzzer.cc PR for tests/fuzzer/ so this whole Parser::Deserialize path gets continuous coverage going forward — neither #9009, #9023, nor this PR's bug class would have escaped a baseline fuzz target.

Thanks for the clean fix, looking forward to seeing it land.

@steadytao
Copy link
Copy Markdown
Contributor Author

If useful, I have minimized PoC bytes for each of the 4 additional manifestations (200 B, 231 B, 389 B, 500 B). Happy to attach them as additional entries in tests/test.cpp next to your DuplicateEnumNameBinarySchemaTest

More than happy for either, whichever is most preferred. As Derek may take some time to review, commiting to this PR may be the best choice. I have also confirmed the behaviour you have mentioned and will retitle as fitting.

@steadytao steadytao changed the title Avoid double-free on duplicate enum names during BFBS deserialization Fix SymbolTable duplicate-name handling during BFBS deserialization May 9, 2026
@steadytao
Copy link
Copy Markdown
Contributor Author

@dbaileychess Just bumping this your way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ codegen Involving generating code from schema

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use-after-free / double-free in Parser::Deserialize via SymbolTable duplicate handling

3 participants